Proceedings of the International Workshop on Expertise in Translation and Post-editing - Research and Application
Abstract
Study of Electronic Pen Commands for Interactive-Predictive Machine Translation
Vicent Alabau and Francisco Casacuberta
Institut Tecnològic d'Informàtica, Universitat Politècnica de València, Camino de Vera s/n, 46022 València, Spain

Typically, post-editing machine translation (MT) output consists of performing a series of editing operations (i.e., replacing, deleting, inserting, or moving pieces of text) in a text editor, using the keyboard and occasionally the mouse. The translation industry has found this approach efficient, to the point that post-editing guidelines for translation agencies have been proposed [1]. However, it requires the user to sit in front of a desktop computer, which restricts where and how the work can be done. Laptop computers can also be used, although performance may arguably suffer from cramped laptop keyboards and trackpads. In this work, we envision an alternative scenario in which the user performs post-editing tasks on a touch screen or with an electronic pen (e-pen).

Although e-pen interaction may seem impractical for texts that need heavy post-editing, there are a number of circumstances in which it can be more comfortable. First, it can be well suited to post-editing sentences with few errors, as is the case with high fuzzy-match sentences or the revision of sentences already post-edited by a human. Second, it would allow users to perform such tasks while commuting, traveling, or sitting comfortably on the couch in the living room. There is already a de facto standard set of proofreading gestures (cf. Figure 1), from which we have extracted the most promising: substitutions, deletions, insertions, and transpositions. In addition, we have added a shift gesture to move phrases to specific places in the text (i.e., the user circles the phrase and draws an arrow to its destination).
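To make the difference between the five commands concrete, here is a minimal Python sketch that models each gesture as an operation on a tokenized MT hypothesis. It is an illustration under stated assumptions, not the authors' system: the class names (`Substitute`, `Delete`, `Insert`, `Transpose`, `Shift`) and the function `apply_gesture` are hypothetical. It assumes the pen strokes have already been recognized and mapped to word positions, and it treats a transposition as swapping two adjacent words, as in standard proofreading marks.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical gesture types; the original work does not define a data format.

@dataclass
class Substitute:
    pos: int    # index of the word to replace
    word: str   # corrected word (typed or handwritten)

@dataclass
class Delete:
    pos: int    # index of the word to remove

@dataclass
class Insert:
    pos: int    # the new word is inserted before this index
    word: str

@dataclass
class Transpose:
    pos: int    # assumption: swaps the words at pos and pos + 1

@dataclass
class Shift:
    start: int  # first word of the circled phrase
    end: int    # one past the last word of the circled phrase
    dest: int   # target index in the sentence with the phrase removed

Gesture = Union[Substitute, Delete, Insert, Transpose, Shift]

def apply_gesture(words: List[str], g: Gesture) -> List[str]:
    """Return a new token list with gesture g applied; the input is not modified."""
    w = list(words)
    if isinstance(g, Substitute):
        w[g.pos] = g.word
    elif isinstance(g, Delete):
        del w[g.pos]
    elif isinstance(g, Insert):
        w.insert(g.pos, g.word)
    elif isinstance(g, Transpose):
        w[g.pos], w[g.pos + 1] = w[g.pos + 1], w[g.pos]
    elif isinstance(g, Shift):
        # Cut the circled phrase out, then paste it at the arrow's destination.
        phrase = w[g.start:g.end]
        rest = w[:g.start] + w[g.end:]
        w = rest[:g.dest] + phrase + rest[g.dest:]
    else:
        raise TypeError(f"unknown gesture: {g!r}")
    return w

if __name__ == "__main__":
    # Illustrative hypothesis with a misplaced verb and a misspelled word.
    hyp = "the printer the paper feeds automaticaly".split()
    hyp = apply_gesture(hyp, Shift(start=4, end=5, dest=2))
    hyp = apply_gesture(hyp, Substitute(pos=5, word="automatically"))
    print(" ".join(hyp))  # prints "the printer feeds the paper automatically"
```

Keeping shift and transposition as separate commands mirrors the gesture set described above: a transposition only swaps neighbours, whereas a shift moves a whole circled phrase to an arbitrary destination.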
We then studied two e-pen post-editing approaches. In the first, we consider substitutions, deletions, insertions, and shifts; the number of these operations needed to obtain a reference can be computed with the translation edit rate (TER) [2]. In the second approach, we assume that the user works with an interactive-predictive MT (IMT) system [3]. In IMT, the user and the MT system collaborate to produce high-quality output: the user locates the first error, reading from left to right, and corrects it; then, leveraging the newly validated text, the system reformulates (predicts) the continuation of the translation, aiming to improve on the previous hypothesis. In this approach, we also consider transpositions.

To determine which gestures could be most useful, we conducted an experiment on the Xerox corpus [4], a collection of technical manuals. It comprises 56k training sentences and development and test sets of 1.1k sentences. Test-set perplexities for Spanish and English are 35 and 51, respectively.

Table 1 summarizes the edit rate results. The edit rate is the number of edit operations needed to obtain the reference, normalized by the number of words. The IMT system requires fewer interactions, especially for es-en. Next, the number of times each edit operation was applied is shown. We expect the gestures for deletion, insertion, shifting, and transposition to be easy for a machine learning algorithm to tell apart; verifying this, however, is left for future work. In addition, substitutions and insertions require the user to write the correct word, which can be done with a virtual keyboard or by handwriting [5]. The perplexities of these words are 336 for English and 242 for Spanish, whereas the handwriting recognition error rates are 7.4 for English and 8.9 for Spanish.

References
[1] TAUS in partnership with CNGL. Post-editing guidelines, 2011.
[2] Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. A study of translation edit rate with targeted human annotation. In Proceedings of the Association for Machine Translation in the Americas, pages 223–231, 2006.
[3] Sergio Barrachina, Oliver Bender, Francisco Casacuberta, Jorge Civera, Elsa Cubel, Shahram Khadivi, Antonio L. Lagarda, Hermann Ney, Jesús Tomás, and Enrique Vidal. Statistical approaches to computer-assisted translation. Computational Linguistics, 35(1):3–28, 2009.
[4] SchlumbergerSema S.A., Instituto Técnico de Informática, R.W.T.H. Aachen – Lehrstuhl für Informatik VI, R.A.L.I. Laboratory – University of Montreal, Celer Soluciones, Société Gamma, and Xerox Research Centre Europe. X.R.C.: TT2. TransType2 – Computer assisted translation. Project technical annex, 2001.
[5] Vicent Alabau, Alberto Sanchis, and Francisco Casacuberta. Improving online handwritten recognition using translation models in multimodal interactive machine translation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 389–394. Association for Computational Linguistics, 2011.

The CRITT database of TPR data
Laura Winther Balling & Michael Carl
Copenhagen Business School

We introduce the CRITT database of translation process data, which we are currently building as a publicly available database for translation process research (TPR). The database contains data from experiments conducted over the last five years at the CRITT centre at Copenhagen Business School and elsewhere. The data are stored in a consistent format and include both process and product data. The process data are keystroke data obtained with Translog or other process data collection tools, as well as eye-tracking data where available.
The product data include tokenized and aligned versions of the source and target texts and their annotations in the Treex format. In addition, we include a wide range of characteristics of the texts, languages, translation settings, and translators, which may be used as predictors in large-scale statistical investigations of the translation process. The data have the potential to give us considerable insight into the similarities and differences in translators' performance, within and between languages and texts. A further main objective is to investigate different translation settings, comparing from-scratch translation with post-editing (where the source text is available) and editing (where the source text
Publication date: 2012